Out-of-Domain (OOD) intent detection is important for practical dialog systems. To alleviate the issue of lacking OOD training samples, some works propose synthesizing pseudo OOD samples and directly assigning one-hot OOD labels to these pseudo samples. However, these one-hot labels introduce noise into the training process because some hard pseudo OOD samples may coincide with In-Domain (IND) intents. In this paper, we propose an adaptive soft pseudo labeling (ASoul) method that estimates soft labels for pseudo OOD samples when training OOD detectors. Semantic connections between pseudo OOD samples and IND intents are captured using an embedding graph. A co-training framework is further introduced to produce the resulting soft labels following the smoothness assumption, i.e., close samples are likely to have similar labels. Extensive experiments on three benchmark datasets show that ASoul consistently improves OOD detection performance and outperforms various competitive baselines.
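The idea of replacing a hard one-hot OOD label with a similarity-driven soft label can be sketched as follows. This is a minimal illustration, not the paper's method: the fixed `ood_logit` stands in for ASoul's graph-based estimate, and the centroid-cosine scoring is an assumed simplification of its embedding graph.

```python
import math

def soft_label(pseudo_emb, ind_centroids, ood_logit=0.5, temp=1.0):
    """Soft label for a pseudo-OOD sample over [IND intents..., OOD].

    Instead of a hard one-hot OOD label, cosine similarities to IND class
    centroids (plus a fixed OOD logit, an illustrative stand-in for the
    graph-based estimate) are softmax-normalized, so a hard pseudo sample
    that coincides with an IND intent keeps probability mass on that intent.
    """
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    logits = [cos(pseudo_emb, c) / temp for c in ind_centroids] + [ood_logit / temp]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]
```

A pseudo sample lying close to an IND centroid then receives most of its mass on that intent rather than on the OOD class, which is exactly the noise the one-hot scheme introduces.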
The growing demand for mental health support has highlighted the importance of conversational agents as human supporters worldwide and in China. These agents can increase availability and reduce the relative costs of mental health support. The provided support can be divided into two main types: cognitive and emotional support. Existing work on this topic mainly focuses on agents constructed with principles of Cognitive Behavioral Therapy (CBT). Such agents operate based on pre-defined templates and exercises to provide cognitive support. However, research on the use of such agents for emotional support is limited. In addition, most of the constructed agents operate in English, highlighting the importance of conducting such studies in China. In this study, we analyze the effectiveness of Emohaa in reducing symptoms of mental distress. Emohaa is a conversational agent that provides cognitive support through CBT-based exercises and guided conversations. It also supports users emotionally by enabling them to vent their desired emotional problems. The study included 134 participants, split into three groups: Emohaa (CBT-based), Emohaa (Full), and control. Experimental results demonstrated that, compared to the control group, participants using Emohaa experienced considerably greater improvements in symptoms of mental distress. We also found that the added emotional support component had a complementary effect on such improvements, mainly for depression and insomnia. Based on the obtained results and participants' satisfaction with the platform, we conclude that Emohaa is a practical and effective tool for reducing mental distress.
QA models with lifelong learning (LL) abilities are important for practical QA applications, and architecture-based LL methods are reported to be an effective implementation for these models. However, it is non-trivial to extend previous approaches to QA tasks, since they either require access to task identities in the testing phase or do not explicitly model samples from unseen tasks. In this paper, we propose Diana: a dynamic architecture-based lifelong QA model that tries to learn a sequence of QA tasks with a prompt-enhanced language model. Diana uses four types of hierarchically organized prompts to capture QA knowledge from different granularities. Specifically, we dedicate task-level prompts to capture task-specific knowledge to retain high LL performance, and maintain instance-level prompts to learn knowledge shared across different input samples to improve the model's generalization performance. Moreover, we dedicate separate prompts to explicitly model unseen tasks and introduce a set of prompt key vectors to facilitate knowledge sharing between tasks. Extensive experiments demonstrate that Diana outperforms state-of-the-art lifelong QA models, especially in handling unseen tasks.
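The prompt key vectors above suggest a simple retrieval step: match an input's representation against the keys and attach the best-matching prompts. The sketch below is an assumed simplification, not Diana's implementation; the pool names and the dot-product matching rule are illustrative.

```python
def select_prompts(query_key, prompt_pool, top_k=2):
    """Retrieve the prompts whose key vectors best match a query representation.

    `prompt_pool` maps a prompt id to its key vector. Selection by highest
    dot product is a common instantiation of prompt-key matching; the ids
    here are hypothetical.
    """
    scored = sorted(
        prompt_pool.items(),
        key=lambda kv: -sum(q * k for q, k in zip(query_key, kv[1])),
    )
    return [pid for pid, _ in scored[:top_k]]
```

Because matching is done purely from the input representation, no task identity is needed at test time, which is the property the abstract emphasizes.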
Building models for natural language processing (NLP) is challenging in low-resource scenarios where only limited data are available. Optimization-based meta-learning algorithms achieve promising results in low-resource scenarios by adapting a well-generalized model initialization to handle new tasks. Nonetheless, these approaches suffer from the memorization overfitting issue, where the model tends to memorize the meta-training tasks while ignoring support sets when adapting to new tasks. To address this issue, we propose a memory imitation meta-learning (MemIML) method that enhances the model's reliance on support sets for task adaptation. Specifically, we introduce a task-specific memory module to store support set information and construct an imitation module to force query sets to imitate the behaviors of some representative support-set samples stored in the memory. A theoretical analysis is provided to prove the effectiveness of our method, and empirical results also demonstrate that our method outperforms competitive baselines on both text classification and generation tasks.
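The memory-plus-imitation mechanism can be sketched as below. This is a minimal illustration under assumed simplifications, not the paper's architecture: representations are plain vectors, and the squared distance to the nearest stored support representation stands in for the imitation objective.

```python
class TaskMemory:
    """Task-specific memory of support-set representations (MemIML-style sketch).

    Query representations are encouraged to imitate their nearest stored
    support representation; penalizing `imitation_loss` makes adaptation
    depend on the support set rather than on memorized meta-training tasks.
    """
    def __init__(self):
        self.slots = []

    def write(self, support_reps):
        # Store support-set representations for the current task.
        self.slots.extend(support_reps)

    def imitation_loss(self, query_rep):
        # Squared distance from the query to its closest support representation.
        return min(
            sum((q - s) ** 2 for q, s in zip(query_rep, slot))
            for slot in self.slots
        )
```

A query that already behaves like some support sample incurs near-zero loss, while one that ignores the support set is penalized.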
Pre-trained models have proved to be powerful in enhancing task-oriented dialog systems. However, current pre-training methods mainly focus on enhancing dialog understanding and generation tasks while neglecting the exploitation of dialog policy. In this paper, we propose GALAXY, a novel pre-trained dialog model that explicitly learns dialog policy from limited labeled dialogs and large-scale unlabeled dialog corpora via semi-supervised learning. Specifically, we introduce a dialog act prediction task for policy optimization during pre-training and employ a consistency regularization term to refine the learned representations with the help of unlabeled dialogs. We also implement a gating mechanism to weigh suitable unlabeled dialog samples. Empirical results show that GALAXY substantially improves the performance of task-oriented dialog systems and achieves new state-of-the-art results on benchmark datasets: In-Car, MultiWOZ2.0, and MultiWOZ2.1, improving their end-to-end combined scores by 2.5, 5.3, and 5.5 points. We also show that GALAXY has a stronger few-shot ability than existing models under various low-resource settings.
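A common instantiation of the consistency regularization mentioned above penalizes disagreement between two predictions for the same unlabeled dialog under different dropout perturbations. The bidirectional-KL form below is an assumed, generic version, not necessarily GALAXY's exact loss.

```python
import math

def consistency_loss(p, q, eps=1e-12):
    """Bidirectional KL divergence between two dialog-act distributions
    predicted for the same unlabeled dialog under different perturbations.
    A small epsilon guards against log(0)."""
    def kl(a, b):
        return sum(ai * math.log((ai + eps) / (bi + eps)) for ai, bi in zip(a, b))
    return 0.5 * (kl(p, q) + kl(q, p))
```

The loss is zero when the two predictions agree and grows as they diverge, so minimizing it over unlabeled dialogs smooths the learned policy representation.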
Unsupervised domain adaptation (UDA) with pre-trained language models (PrLMs) has achieved promising results, since these pre-trained models embed generic knowledge learned from various domains. However, fine-tuning all parameters of a PrLM on a small domain-specific corpus distorts the learned generic knowledge, and it is also expensive to deploy a whole fine-tuned PrLM for each domain. This paper explores an adapter-based fine-tuning approach for unsupervised domain adaptation. Specifically, several trainable adapter modules are inserted into the PrLM, and the embedded generic knowledge is preserved by fixing the parameters of the original PrLM during fine-tuning. A domain-fusion scheme is introduced to train these adapters with a mixed-domain corpus to better capture transferable features. Elaborate experiments on two benchmark datasets are carried out, and the results demonstrate that our approach is effective across different tasks, dataset sizes, and domain similarities.
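The standard bottleneck adapter this line of work builds on can be sketched in a few lines. This is a generic illustration, not the paper's exact module: sizes, initialization, and the ReLU nonlinearity are assumptions, and real adapters operate on tensors inside each Transformer layer.

```python
import random

class Adapter:
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual.

    The frozen PrLM's hidden state passes through unchanged plus a small
    learned correction; zero-initializing the up-projection makes the
    adapter start as an identity map, so generic knowledge is preserved.
    """
    def __init__(self, hidden, bottleneck, seed=0):
        rng = random.Random(seed)
        self.down = [[rng.uniform(-0.1, 0.1) for _ in range(bottleneck)]
                     for _ in range(hidden)]
        self.up = [[0.0] * hidden for _ in range(bottleneck)]  # identity at init

    def __call__(self, h):
        # Down-projection with ReLU, then up-projection, then residual add.
        z = [max(0.0, sum(h[i] * self.down[i][j] for i in range(len(h))))
             for j in range(len(self.down[0]))]
        delta = [sum(z[j] * self.up[j][k] for j in range(len(z)))
                 for k in range(len(h))]
        return [hk + dk for hk, dk in zip(h, delta)]
```

Only the adapter weights are trained per domain, so deployment stores one small module per domain plus a single shared PrLM.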
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications cannot, or can only marginally, benefit from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distilling targets, losses, input, network regularization, sequential distillation, etc., revealing that: 1) Distilling token relations is more effective than CLS token- and feature-based distillation; 2) An intermediate layer of the teacher network serves as a better distillation target than its last layer when the student's depth mismatches the teacher's; 3) Weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over from-scratch MIM pre-training on ImageNet-1K classification, using the ViT-Tiny, ViT-Small, and ViT-Base models, with +4.2%/+2.4%/+1.4% gains, respectively. Our TinyMIM model of base size achieves 52.2 mIoU on ADE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way to develop small vision Transformer models, that is, by exploring better training methods rather than introducing inductive biases into architectures as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.
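Finding 1) above, distilling token relations, can be sketched as matching softmax-normalized pairwise token affinities between teacher and student. This is an assumed, simplified instantiation (plain dot products on small Python lists), not TinyMIM's actual loss over attention heads.

```python
import math

def relation_matrix(tokens):
    """Row-softmax of pairwise token dot products (a QK^T-style relation map)."""
    rows = []
    for a in tokens:
        scores = [sum(x * y for x, y in zip(a, b)) for b in tokens]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        rows.append([e / z for e in exps])
    return rows

def relation_distill_loss(teacher_tokens, student_tokens):
    """Soft cross-entropy between teacher and student relation maps,
    averaged over tokens: minimized when the student reproduces the
    teacher's token-token affinities."""
    T = relation_matrix(teacher_tokens)
    S = relation_matrix(student_tokens)
    n = len(T)
    return -sum(T[i][j] * math.log(S[i][j])
                for i in range(n) for j in range(n)) / n
```

Matching relations rather than raw features sidesteps the dimension mismatch between a large teacher and a small student, since both relation maps are n-by-n regardless of channel width.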
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
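The NAIVEATTACK variant described above amounts to stamping a fixed trigger into raw images before distillation begins. The sketch below is an illustrative reconstruction on a 2D grayscale list, with corner placement and relabeling chosen as typical backdoor conventions rather than taken from the paper.

```python
def stamp_trigger(image, trigger, target_label):
    """NAIVEATTACK-style poisoning sketch: stamp a trigger patch into the
    bottom-right corner of a raw image (2D grayscale list here) before
    distillation, and relabel the sample with the attacker's target class.
    The input image is left unmodified."""
    poisoned = [row[:] for row in image]
    th, tw = len(trigger), len(trigger[0])
    h, w = len(image), len(image[0])
    for i in range(th):
        for j in range(tw):
            poisoned[h - th + i][w - tw + j] = trigger[i][j]
    return poisoned, target_label
```

DOORPING differs in that the trigger itself is re-optimized at every distillation iteration instead of being fixed up front, which is why it reaches near-1.0 ASR where the naive variant does not.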
Benefiting from its intrinsic capability to exploit supervision information, contrastive learning has recently achieved promising performance in the field of deep graph clustering. However, we observe that two drawbacks of the positive and negative sample construction mechanisms limit the performance of existing algorithms from further improvement. 1) The quality of positive samples heavily depends on carefully designed data augmentations, while inappropriate data augmentations easily lead to semantic drift and indiscriminative positive samples. 2) The constructed negative samples are unreliable because they ignore important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) that mines the intrinsic supervision information in high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbation, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct the positive samples from the same high-confidence cluster in the two views. Moreover, to construct semantically meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function that pulls together samples from the same cluster while pushing away those from other clusters by maximizing and minimizing the cross-view cosine similarity between positive and negative samples. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with existing state-of-the-art algorithms.
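The objective in the last step, with other clusters' centers serving as negatives, can be sketched as an InfoNCE-style loss over cross-view cosine similarities. This is an assumed simplification on plain vectors, not CCGC's full objective.

```python
import math

def cluster_contrastive_loss(z1, z2, negative_centers, temp=0.5):
    """Pull a node's two views together while pushing the first view away
    from the centers of other high-confidence clusters, via a softmax
    over temperature-scaled cosine similarities."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a))
                      * math.sqrt(sum(x * x for x in b)))

    pos = math.exp(cos(z1, z2) / temp)
    neg = sum(math.exp(cos(z1, c) / temp) for c in negative_centers)
    return -math.log(pos / (pos + neg))
```

Using cluster centers as negatives avoids the false-negative problem of sampling random nodes, since a center of a different high-confidence cluster is unlikely to share the anchor's class.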
As one of the prevalent methods for building automation systems, Imitation Learning (IL) presents promising performance in a wide range of domains. However, despite the considerable improvement in policy performance, the corresponding research on the explainability of IL models is still limited. Inspired by recent approaches in explainable artificial intelligence, we propose a model-agnostic explanation framework for IL models called R2RISE. R2RISE aims to explain the overall policy performance with respect to the frames in demonstrations. It iteratively retrains the black-box IL model from randomly masked demonstrations and uses the conventional evaluation outcome, environment returns, as the coefficients to build an importance map. We also conducted experiments to investigate three major questions concerning frames' importance equality, the effectiveness of the importance map, and connections between importance maps from different IL models. The results show that R2RISE successfully distinguishes important frames from the demonstrations.
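The mask-retrain-score loop above follows the RISE recipe and can be sketched as follows. The `evaluate` callback is a hypothetical stand-in for "retrain the IL model on the kept frames and measure the environment return"; mask density and count are illustrative choices.

```python
import random

def importance_map(num_frames, evaluate, num_masks=200, p_keep=0.5, seed=0):
    """RISE-style importance over demonstration frames.

    For each random binary mask, `evaluate(mask)` stands in for retraining
    the black-box IL policy on the kept frames and returning its environment
    return. Each frame's score is its return-weighted keep frequency,
    normalized by the total return."""
    rng = random.Random(seed)
    scores = [0.0] * num_frames
    total = 0.0
    for _ in range(num_masks):
        mask = [1 if rng.random() < p_keep else 0 for _ in range(num_frames)]
        ret = evaluate(mask)
        total += ret
        for i, m in enumerate(mask):
            scores[i] += m * ret
    return [s / (total or 1.0) for s in scores]
```

Frames whose presence correlates with high returns accumulate large scores, so the resulting map separates the demonstration frames the policy actually relies on from filler.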